{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Contextual Retrieval With Llama Index\n",
    "\n",
    "This notebook covers contextual retrieval with llama_index DocumentContextExtractor\n",
    "\n",
    "Based on an Anthropic [blost post](https://www.anthropic.com/news/contextual-retrieval), the concept is to:\n",
    "1. Use an LLM to generate a 'context' for each chunk based on the entire document\n",
    "2. embed the chunk + context together\n",
    "3. reap the benefits of higher RAG accuracy\n",
    "\n",
    "While you can also do this manually, the DocumentContextExtractor offers a lot of convenience and error handling, plus you can integrate it into your llama_index pipelines! Let's get started.\n",
    "\n",
    "NOTE: This notebook costs about $0.02 everytime you run it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Install Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index\n",
    "%pip install llama-index-readers-file\n",
    "%pip install llama-index-embeddings-huggingface\n",
    "%pip install llama-index-llms-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Setup an LLM\n",
    "You can use the MockLLM or you can use a real LLM of your choice here. flash 2 and gpt-4o-mini work well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.core import Settings\n",
    "\n",
    "OPENAI_API_KEY = \"sk-...\"\n",
    "llm = OpenAI(model=\"gpt-4o-mini\", api_key=OPENAI_API_KEY)\n",
    "Settings.llm = llm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    " # Setup a data pipeline\n",
    "\n",
    " we'll need an embedding model, an index store, a vectore store, and a way to split tokens."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build Pipeline & Index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/loganmarkewich/Library/Caches/pypoetry/virtualenvs/llama-index-caVs7DDe-py3.10/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core import VectorStoreIndex, StorageContext\n",
    "from llama_index.core.node_parser import TokenTextSplitter\n",
    "from llama_index.core.storage.docstore.simple_docstore import (\n",
    "    SimpleDocumentStore,\n",
    ")\n",
    "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
    "\n",
    "# Initialize document store and embedding model\n",
    "docstore = SimpleDocumentStore()\n",
    "embed_model = HuggingFaceEmbedding(model_name=\"baai/bge-small-en-v1.5\")\n",
    "\n",
    "# Create storage contexts\n",
    "storage_context = StorageContext.from_defaults(docstore=docstore)\n",
    "storage_context_no_extra_context = StorageContext.from_defaults()\n",
    "text_splitter = TokenTextSplitter(\n",
    "    separator=\" \", chunk_size=256, chunk_overlap=10\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### DocumentContextExtractor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# This is the new part!\n",
    "\n",
    "from llama_index.core.extractors import DocumentContextExtractor\n",
    "\n",
    "context_extractor = DocumentContextExtractor(\n",
    "    # these 2 are mandatory\n",
    "    docstore=docstore,\n",
    "    max_context_length=128000,\n",
    "    # below are optional\n",
    "    llm=llm,  # default to Settings.llm\n",
    "    oversized_document_strategy=\"warn\",\n",
    "    max_output_tokens=100,\n",
    "    key=\"context\",\n",
    "    prompt=DocumentContextExtractor.SUCCINCT_CONTEXT_PROMPT,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget \"https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay_ambiguated.txt\" -O \"paul_graham_essay_ambiguated.txt\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "\n",
    "reader = SimpleDirectoryReader(\n",
    "    input_files=[\"./paul_graham_essay_ambiguated.txt\"]\n",
    ")\n",
    "documents = reader.load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Run the pipeline, then search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 15/15 [00:07<00:00,  2.10it/s]\n"
     ]
    }
   ],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()\n",
    "\n",
    "# need to add documents directly for the DocumentContextExtractor to work\n",
    "storage_context.docstore.add_documents(documents)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents=documents,\n",
    "    storage_context=storage_context,\n",
    "    embed_model=embed_model,\n",
    "    transformations=[text_splitter, context_extractor],\n",
    ")\n",
    "\n",
    "index_nocontext = VectorStoreIndex.from_documents(\n",
    "    documents=documents,\n",
    "    storage_context=storage_context_no_extra_context,\n",
    "    embed_model=embed_model,\n",
    "    transformations=[text_splitter],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_question = \"Which chunks of text discuss the IBM 704?\"\n",
    "retriever = index.as_retriever(similarity_top_k=2)\n",
    "nodes_fromcontext = retriever.retrieve(test_question)\n",
    "\n",
    "retriever_nocontext = index_nocontext.as_retriever(similarity_top_k=2)\n",
    "nodes_nocontext = retriever_nocontext.retrieve(test_question)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "==========\n",
      "NO CONTEXT\n",
      "\n",
      "Chunk 1:\n",
      "Score: 0.5710870309825231\n",
      "Content: it. The result would ordinarily be to print something on the spectacularly loud device.  I was puzzled by the machine. I couldn't figure out what to do with it. And in retrospect there's not much I could have done with it. The only form of input to programs was data stored on cards, and I didn't have any information stored on them. The only other option was to do things that didn't rely on any input, like calculate approximations of pi, but I didn't know enough math to do anything interesting of that type. So I'm not surprised I can't remember any code I wrote, because it can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the manager's expression made clear. With microcomputers, everything changed. Now you could have one sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punched inputs\n",
      "\n",
      "Chunk 2:\n",
      "Score: 0.567587387219806\n",
      "Content: McCarthy's 1960 paper.\n",
      "But if so there's no reason to suppose that this is the limit of the language that might be known to them. Presumably aliens need numbers and errors and I/O too. So it seems likely there exists at least one path out of McCarthy's Lisp along which discoveredness is preserved.\n",
      "Thanks to Trevor Blackwell, John Collison, Patrick Collison, Daniel Gackle, Ralph Hazell, Jessica Livingston, Robert Morris, and Harj Taggar for reading drafts of this.\n",
      "==========\n",
      "WITH CONTEXT\n",
      "\n",
      "Chunk 1:\n",
      "Score: 0.6776241992281743\n",
      "Content: it. The result would ordinarily be to print something on the spectacularly loud device.  I was puzzled by the machine. I couldn't figure out what to do with it. And in retrospect there's not much I could have done with it. The only form of input to programs was data stored on cards, and I didn't have any information stored on them. The only other option was to do things that didn't rely on any input, like calculate approximations of pi, but I didn't know enough math to do anything interesting of that type. So I'm not surprised I can't remember any code I wrote, because it can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the manager's expression made clear. With microcomputers, everything changed. Now you could have one sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punched inputs\n",
      "\n",
      "Chunk 2:\n",
      "Score: 0.6200645958839048\n",
      "Content: Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. They were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep. The first programs I tried writing were on the IBM 1401 that our school district used for what was then called \"data processing.\" This was in 9th grade, so I was 13 or 14. The district's machine happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. The space was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights. The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the reader and press a button to load the code into memory and run it. The result would ordinarily be to print something\n"
     ]
    }
   ],
   "source": [
    "# Print each node's content\n",
    "print(\"==========\")\n",
    "print(\"NO CONTEXT\")\n",
    "for i, node in enumerate(nodes_nocontext, 1):\n",
    "    print(f\"\\nChunk {i}:\")\n",
    "    print(f\"Score: {node.score}\")  # Similarity score\n",
    "    print(f\"Content: {node.node.text}\")  # The actual text content\n",
    "\n",
    "# Print each node's content\n",
    "print(\"==========\")\n",
    "print(\"WITH CONTEXT\")\n",
    "for i, node in enumerate(nodes_fromcontext, 1):\n",
    "    print(f\"\\nChunk {i}:\")\n",
    "    print(f\"Score: {node.score}\")  # Similarity score\n",
    "    print(f\"Content: {node.node.text}\")  # The actual text content"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
